Skip to content

http.BodyWriter: handle EOF in chunkedSendFile, simplify#24864

Merged
andrewrk merged 2 commits into
ziglang:masterfrom
ifreund:fix-std-cmd
Aug 16, 2025
Merged

http.BodyWriter: handle EOF in chunkedSendFile, simplify#24864
andrewrk merged 2 commits into
ziglang:masterfrom
ifreund:fix-std-cmd

Conversation

@ifreund

@ifreund ifreund commented Aug 15, 2025

Copy link
Copy Markdown
Member

http.BodyWriter: handle EOF in chunkedSendFile, simplify

With these changes, the zig std command now works again and doesn't trigger assertion failures or mess up the chunked transfer encoding.

Closes #24760
Closes #24859

Comment thread lib/std/http.zig Outdated
sendfile_limit.min(limit),
) catch |err| switch (err) {
error.EndOfStream => {
chunked.* = .{ .chunk_len = 0 };

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be based on observing how many bytes were read from file_reader? something like const before = file_reader.logicalPos(); ...; const after = file_reader.logicalPos(); const n = after - before;

and then we still would need to do the w.consume(n) logic

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I guess that would potentially give us more robust assertions but I'm not sure there would be any behavioral difference. My understanding of the sendFile interface is that EndOfStream is only returned by the writer when all bytes have been read from the reader.

Perhaps I've misunderstood that though? I'll defer to your judgement here, I'm still learning the new interfaces (which is why I thought it would be nice to fix some bugs to enhance my understanding).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before you change it, let me think about it a bit more...

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of the sendFile interface is that EndOfStream is only returned by the writer when all bytes have been read from the reader.

That's correct:

pub const FileError = error{
...
    /// Reached the end of the file being read.
    EndOfStream,
...
};

The question is whether chunk_len might be greater than the size of the file. We can see it is calculated like this:

        const data_len = Writer.countSendFileLowerBound(w.end, file_reader, limit) orelse {
            // If the file size is unknown, we cannot lower to a `sendFile` since we would
            // have to flush the chunk header before knowing the chunk length.
            return error.Unimplemented;
        };

...which is based on reported size from stat combined with any already-buffered data.

In theory, larger chunk sizes could be emitted if the stream implementation somehow knew there would be more data after this, but in practice, it does not. So I think your direct setting of chunk_len to zero is indeed behaviorally equivalent.

However, still 2 problems:

  • this is nontrivial and so should be asserted that these values equal each other
  • the consume() call still needs to happen to account for consumed buffered data

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm, thinking about this even more... why does sendFileHeader have EndOfStream in the error set? I think that should be handled in that function and then return how many bytes were read, since it already supports short reads.

That's the difference between the "limit" functions and "exact" functions - "limit" functions can return short and do not EndOfStream. "exact" functions take an integer and return EndOfStream if that many bytes cannot be read.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the doc comment for the sendFile function in the vtable seems to suggest that error.EndOfStream was not the intended way to signal stream end originally:

    /// Number of bytes returned may be zero, which does not indicate stream
    /// end. A subsequent call may return nonzero, or signal end of stream via
    /// `error.WriteFailed`. Caller may check `file_reader` state
    /// (`File.Reader.atEnd`) to disambiguate between a zero-length read or
    /// write, and whether the file reached the end.

I'm going to experiment with deleting EndOfStream from Writer.FileError.

@ifreund ifreund force-pushed the fix-std-cmd branch 2 times, most recently from 0e0277c to 885e737 Compare August 16, 2025 19:57
With these changes, the `zig std` command now works again and doesn't
trigger assertion failures or mess up the chunked transfer encoding.
@ifreund

ifreund commented Aug 16, 2025

Copy link
Copy Markdown
Member Author

@andrewrk Let me know what you think of this version, it now fixes all known regressions with the zig std command as far as I can tell. I think the new assert(file_reader.atEnd()) is pretty strong, and I managed to get rid of the whole offset state thing without seeming to break anything.

I explored deleting EndOfStream from all error sets in Writer but backed out the change for this branch as the patch got too invasive and it did not seem like a good idea to lump in with a bugfix.

Tests pass with -Dskip-release -Dskip-non-native for me locally :)

@ifreund ifreund requested a review from andrewrk August 16, 2025 20:21
@ifreund ifreund changed the title http.BodyWriter: fix assert fail in endChunkedUnflushed() http.BodyWriter: handle EOF in chunkedSendFile, simplify Aug 16, 2025

@andrewrk andrewrk left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced this code is in its final form, but this is clearly a step in the right direction, so let's merge it!

Comment thread lib/std/http.zig
chunked.chunk_len = chunk_len - n;
return w.consume(n);
},
1 => unreachable,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this really unreachable? it looks like it could be reached from line 970 setting chunk_len to 1, or line 987 setting chunk_len to 1.

@ifreund ifreund Aug 17, 2025

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is unreachable. It is in fact an assertion that we do not violate the protocol by e.g. writing only \n rather than \r\n or writing more data than specified in the chunk header. I do see a way to make this code a bit more clear though.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #24884 to further improve the clarity of this code and document the assertions better.

Comment thread lib/std/http.zig
Comment on lines -787 to -801
.end, .none, .content_length => return out.flush(),
.chunked => |*chunked| switch (chunked.*) {
.offset => |offset| {
const chunk_len = out.end - offset - chunk_header_template.len;
if (chunk_len > 0) {
writeHex(out.buffer[offset..][0..chunk_len_digits], chunk_len);
chunked.* = .{ .chunk_len = 2 };
} else {
out.end = offset;
chunked.* = .{ .chunk_len = 0 };
}
try out.flush();
},
.chunk_len => return out.flush(),
},

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to give you some background for why this was done: the idea was that when flushing a chunked http transfer request, it should finish the chunk and send the end-of-chunk trailer and start a new chunk. Looking at it with fresh eyes, I'm not sure how valuable that is, although I can think of some reasons for it.

anyway I think the simplification that you brought is nice, so let's run with that.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this offset code in flush was actually unreachable before my commit.

I don't think it's possible to implement what you describe either. We can't finish the chunk early without writing the full amount of data specified in the chunk header, and we don't have access to data that the user of the API started but did not finish writing here in flush().

@andrewrk andrewrk merged commit 623290e into ziglang:master Aug 16, 2025
11 of 14 checks passed
@ifreund ifreund deleted the fix-std-cmd branch August 17, 2025 07:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Using zig std shows runtime error zig std regression

2 participants